78 research outputs found

Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers

Author: Ge Tingjian
Madden Samuel R.
Zdonik Stan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along these score-probability dimensions. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which previous work in the area does not address. Our work includes a systematic empirical study on both real and synthetic datasets.

National Natural Science Foundation (Grant numbers IIS-0086057, IIS-0325838, and IIS-0448124)

CiteSeerX

DSpace@MIT

Crossref

Tupleware: Redefining Modern Analytics

Author: Cetintemel Ugur
Crotty Andrew
Dursun Kayhan
Galakatos Alex
Kraska Tim
Zdonik Stan
Publication date: 30/07/2014

There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems.

arXiv.org e-Print Archive

CiteSeerX

Greenhouse: A Zero-Positive Machine Learning System for Time-Series Anomaly Detection

Author: Gottschlich Justin E
Lee Tae J
Metcalf Eric
Tatbul Nesime
Zdonik Stan
Publication venue: ScholarlyCommons
Publication date: 01/01/2018

This short paper describes our ongoing research on Greenhouse - a zero-positive machine learning system for time-series anomaly detection

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Precision and Recall for Range-Based Anomaly Detection

Author: Gottschlich Justin E
Lee Tae J
Metcalf Eric
Tatbul Nesime
Zdonik Stan
Publication venue: ScholarlyCommons
Publication date: 01/01/2018

Classical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time.

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Precision and Recall for Time Series

Author: Alam Mejbah
Gottschlich Justin E
Lee Tae J
Tatbul Nesime
Zdonik Stan
Publication venue: ScholarlyCommons
Publication date: 01/01/2018

Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Yet, many real-world anomalies are range-based, meaning they occur over a period of time. Motivated by this observation, we present a new mathematical model to evaluate the accuracy of time series classification algorithms. Our model expands the well-known Precision and Recall metrics to measure ranges, while simultaneously enabling customization support for domain-specific preferences

arXiv.org e-Print Archive

ScholarlyCommons@Penn

An Optimal Relational Database Encryption Scheme

Author: Seny Kamara
Stan Zdonik
Tarik Moataz
Zheguang Zhao
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 04/03/2020

Recently, Kamara and Moataz described the first encrypted relational database solution with support for a non-trivial fraction of SQL that does not make use of property-preserving encryption (Asiacrypt, 2018). More precisely, their construction, called SPX, handles the set of conjunctive SQL queries. While SPX was shown to be optimal for the subset of uncorrelated conjunctive SQL queries, it did not handle correlated queries optimally. Furthermore, it only handles queries in heuristic normal form. In this work, we address these limitations by proposing an extension of SPX that handles all conjunctive SQL queries optimally no matter what form they are in

Cryptology ePrint Archive

S-Store: Streaming Meets Transaction Processing

Author: Aslantas Cansu
Cetintemel Ugur
Du Jiang
Kraska Tim
Madden Samuel
Maier David
Meehan John
Pavlo Andrew
Stonebraker Michael
Tatbul Nesime
Tufte Kristin
Wang Hao
Zdonik Stan
Publication date: 01/01/2015

Stream processing addresses the needs of real-time applications. Transaction processing addresses the coordination and safety of short atomic computations. Heretofore, these two modes of operation existed in separate, stove-piped systems. In this work, we attempt to fuse the two computational paradigms in a single system called S-Store. In this way, S-Store can simultaneously accommodate OLTP and streaming applications. We present a simple transaction model for streams that integrates seamlessly with a traditional OLTP system. We chose to build S-Store as an extension of H-Store, an open-source, in-memory, distributed OLTP database system. By implementing S-Store in this way, we can make use of the transaction processing facilities that H-Store already supports, and we can concentrate on the additional implementation features that are needed to support streaming. Similar implementations could be done using other main-memory OLTP platforms. We show that we can actually achieve higher throughput for streaming workloads in S-Store than an equivalent deployment in H-Store alone. We also show how this can be achieved within H-Store with the addition of a modest amount of new functionality. Furthermore, we compare S-Store to two state-of-the-art streaming systems, Spark Streaming and Storm, and show how S-Store matches and sometimes exceeds their performance while providing stronger transactional guarantees

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

PDXScholar (Portland State University)

A note from the editor

Author: Stan Zdonik
Publication venue: Association for Computing Machinery (ACM)

Crossref